Abstract

It has proved extremely challenging for humans to program a robot
to a degree sufficient for it to act properly in a typical unknown
human environment. This is especially true for a humanoid robot,
because of the very large number of redundant degrees of freedom
and the large number of sensors that a humanoid requires to work
safely and effectively in the human environment. How can we address
this fundamental problem? Motivated by human mental development
from infancy to adulthood, we present a theory, an architecture, and
some experimental results showing how to enable a robot to develop
its mind automatically, through online, real-time interactions with
its environment. Humans mentally “raise” the robot through “robot
sitting” and “robot schools” instead of task-specific robot programming.

1.  Introduction 

The conventional development process for a robot is not automatic:
a human designer is in the loop. A typical process goes like this.
Given a robotic task, it is the human designer who analyzes and
understands the task. Based on his understanding, he comes up with a
representation, chooses a computational method, and writes a program
that implements his method for the robot. The representation largely
reflects the human designer’s understanding of the robot task. During
this development process, some machine learning might be used, in
which some parameters are adjusted according to the collected data.
However, these parameters are defined by the human designer’s
representation for the given task. The resulting program is for this
task only, not for any other task. If the robotic task is complex,
the capability of handling variation in the environment is severely
limited by the human-designed, task-specific representation. This
manual development paradigm has met tremendous difficulties for tasks
that require complex cognitive and behavioral capabilities, such as
the many sensing and behavioral skills that a humanoid must have in
order to execute high-level human commands, including autonomous
navigation, object manipulation, object delivery, target finding, and
human-robot interaction through gesture in unknown environments. The
high number of degrees of freedom, the redundant manipulators, and the
large number of effectors that a humanoid has, plus the multimodal
sensing capabilities that are required to work with humans, further
increase these difficulties. The complex and changing nature of the
human environment has made the issue of autonomous mental development
of robots (the way the human mind develops) more important than ever.

Many robotics researchers may believe that the human brain has an
innate representation for the tasks that humans generally do. However,
recent studies of brain plasticity have shown that our brain is not as
task-specific as commonly believed. There exist rich studies of brain
plasticity in neuroscience, ranging from varying the extent of sensory
input, redirecting input, and transplanting cortex, to lesion studies
and sensitive periods. Redirecting input seems illuminating in
explaining how task-specific our brain really is. For example,
Mriganka Sur and his coworkers rewired visual input to the auditory
cortex of ferrets early in life. The target tissue in the auditory
cortex, which is supposed to take on auditory representation, was found
to take on visual representation instead (Sur et al., 1999).
Furthermore, they have successfully trained the animals to perform
visual tasks using the rewired auditory cortex (von Melchner et al.,
2000). Why are the self-organization schemes that guide development in
our brain so general that they can deal with either speech or vision,
depending on what input they receive during development? Why do robots
that are programmed using human-designed, task-specific representations
not do well in complex, changing, or partially or totally unknown
environments? What are the self-organization schemes that robots can
use to autonomously develop their mental skills through interactions
with the environment? Is it more advantageous to enable robots to
autonomously develop their mental skills than to program robots using
human-specified, task-specific representations?

Although robot mental development is very much a new concept
(Weng et al., 2000b), a lot of well-known self-organization tools can
be used in designing a developmental robot. In this paper, we
summarize our recent investigations in this new direction and
hopefully provide some answers to the above questions. In the
following sections, we first outline the previous and current projects
related to robot mental development conducted in our group. Then a
theory of autonomous mental development of robots is presented,
followed by the experimental results on the SAIL robot, a
developmental robot constructed following this theory. A brief
comparison to others’ work is given before we draw the conclusion.

2.  An  outline  of  previous  and  current 
projects 

Our decade-long effort in enabling a machine to grow its perceptual
and behavioral capabilities has gone through four systems:
Cresceptron (1991 - 1995), SHOSLIF (1993 - 2000), SAIL (1996 -
present) and Dav (1999 - present).

Cresceptron is an interactive software system for visual recognition
and segmentation (Weng et al., 1997). The major contribution is a
method to automatically generate (grow) a network for recognition from
training images. The topology of this network is a function of the
content of the training images. Due to its general nature in
representation and learning, it turned out to be one of the first
systems that have been trained to recognize and segment complex
objects of very different natures from natural, complex backgrounds.
Although Cresceptron is a general developmental system, its efficiency
is low.

SHOSLIF (Self-organizing Hierarchical Optimal Subspace Learning and
Inference Framework) was the next project, whose goal was to improve
the efficiency of self-organization. It automatically finds a set of
Most Discriminating Features (MDF) using Principal Component Analysis
(PCA) followed by Linear Discriminant Analysis (LDA), for better
generalization. It is a hierarchical structure organized by a tree to
reach a logarithmic time complexity. Using it in an observation-driven
Markov Decision Process (ODMDP), SHOSLIF has successfully controlled
the ROME robot to navigate in MSU’s large Engineering Building in real
time using only video cameras, without using any range sensors
(Weng and Chen, 1998). All the real-time computing was performed by a
slow Sun SPARC Ultra-1 Workstation, which shows that SHOSLIF is very
efficient for real-time operation. However, it is not an incremental
learning method.

SAIL (Self-organizing, Autonomous, Incremental Learner) is the next
generation after SHOSLIF. The objective of this project is to automate
the real-time incremental development of robot perceptual and
behavioral capabilities. The internal representation of the SAIL robot
(Fig. 1) is generated automatically by the robot itself, starting with
a design of a coarse architecture. A self-organization engine called
Incremental Hierarchical Discriminant Regression (IHDR) is the
critical technology that meets the stringent real-time, incremental,
small sample size, large memory, and better generalization
requirements (Hwang and Weng, 2000a) (Hwang and Weng, 2000b). IHDR
automatically and incrementally grows and updates a tree (network) of
nodes (which remotely resemble cortical areas). In each node is an
incrementally updated feature subspace, derived from the most
discriminating features for better generalization. Discriminating
features disregard factors that are not related to perception or
actions, such as lighting in object recognition and autonomous
navigation.

Figure 1: The SAIL (left) and Dav (right) robots.


The Dav robot (Fig. 1) is a humanoid robot, currently being developed
as a next-generation test-bed for experimental investigations into
autonomous mental development (Han et al., 2002). This general-purpose
humanoid platform has a total of 43 degrees of freedom (DOF),
including drive base, torso, arms, hands, neck and head. The body may
support a wide array of locomotive and manipulative behaviors. For
perception, Dav is equipped with a variety of sensing systems,
including visual, auditory and haptic sensors. Its computational
resources are entirely onboard, including quadruple Pentium III plus
PowerPCs, large memory and storage, networks, and a long-sustenance
power supply.




Figure  2:  The  abstract  model  of  a  traditional  agent, 
which  perceives  the  external  environment  and  acts  on  it 
(adapted  from  (Russell  and  Norvig,  1995)).  The  source 
of  perception  and  the  target  of  action  do  not  include  the 
agent  brain  representation. 

3.  A  theory  for  mentally  developing 
robots 

Evolving along with the above robot projects is a theoretical
framework for autonomous mental development of robots. We present the
major components of this theory here. For more details, the reader is
referred to (Weng, 2002).

3.1  SASE  Agents 

As defined in the standard AI literature (see, e.g., an excellent
textbook (Russell and Norvig, 1995) and an excellent survey
(Franklin, 1997)), an agent is something that senses and acts; its
abstract model is shown in Fig. 2. As shown, the environment E of an
agent is the world outside the agent.

To be precise in our further discussion, we need some mathematical
notation. A context of an agent is a stochastic process
(Papoulis, 1976), denoted by g(t). It consists of two parts,
g(t) = (x(t), a(t)), where x(t) denotes the sensory vector at time t,
which collects all signals (values) sensed by the sensors of the agent
at time t, and a(t) denotes the effector vector, consisting of all the
signals sent to the effectors of the agent at time t. The context of
the agent from the time t_1 (when the agent is turned on) up to a
later time t_2 is a realization of the random process
{g(t) | t_1 < t < t_2}. Similarly, we call {x(t) | t_1 < t < t_2} a
sensory context and {a(t) | t_1 < t < t_2} an action context.
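
The notation above can be made concrete with a small sketch. It assumes discrete time steps and uses hypothetical names (`ContextSample`, `window`, `sensory_context`, `action_context`) that do not appear in the original; it only illustrates how a context, a sensory context, and an action context relate to one another.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class ContextSample:
    """One sample g(t) = (x(t), a(t)) at a discrete time t (assumption)."""
    t: int
    x: Vector  # sensory vector x(t)
    a: Vector  # effector vector a(t)


def window(context: List[ContextSample], t1: int, t2: int) -> List[ContextSample]:
    """Return the realization {g(t) | t1 < t < t2}, with strict bounds as in the text."""
    return [g for g in context if t1 < g.t < t2]


def sensory_context(context: List[ContextSample]) -> List[Vector]:
    """Keep only the sensory part x(t) of each sample."""
    return [g.x for g in context]


def action_context(context: List[ContextSample]) -> List[Vector]:
    """Keep only the effector part a(t) of each sample."""
    return [g.a for g in context]


# A hypothetical recorded run with one sensor line and one effector line.
run = [ContextSample(t, (0.1 * t,), (float(t % 2),)) for t in range(6)]
recent = window(run, 0, 4)
```

Here `sensory_context(recent)` and `action_context(recent)` are simply the two projections of the same realization.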

The set of all the possible contexts of an environment E is called the
context domain V. As indicated by Fig. 2, at each time t, the agent
senses vector x(t) from the environment using its sensors and sends
a(t) as action to its effectors. Typically, at any time t, the agent
uses only a subset of the history represented in the context, since
only a subset is closely related to the current action.

The model in Fig. 2 is for an agent that perceives only the external
environment and acts on the external environment. Such agents range
from a simple thermostat to a complex space shuttle. This well
accepted model has played an important role in agent research and
applications. Unfortunately, this model has a fundamental flaw: It
does not sense its internal “brain” activities. In other words, its
internal decision process is neither a target of its own cognition nor
a subject for the agent to explain.

Figure 3: A self-aware self-effecting (SASE) agent. It interacts with
not only the external environment but also its own internal (brain)
environment: the representation of the brain itself.

The human brain allows the thinker to sense what he is thinking about
without performing an overt action. For example, visual attention is a
self-aware self-effecting internal action (see, e.g.,
(Kandel et al., 1991), pp. 396 - 403). Motivated by neuroscience, it
is proposed here that a highly intelligent being must be self-aware
and self-effecting (SASE). Fig. 3 shows an illustration of a SASE
agent. A formal definition of a SASE agent is as follows:

Definition 1 A self-aware and self-effecting (SASE) agent has internal
sensors and internal effectors. In addition to interacting with the
external environment, it senses some of its internal representation as
a part of its perceptual process and it generates actions for its
internal effectors as a part of its action process.

Using this new agent model, the sensory context x(t) of a SASE agent
must contain information about not only the external environment E,
but also the internal representation R. Further, the action context
a(t) of a SASE agent must include signals for internal effectors that
act on R.
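
As a minimal sketch of Definition 1, the toy agent below senses part of its own internal representation R together with the external input, and one of its outputs acts on R rather than on the environment. The class name `SASEAgent`, the entries stored in R, and the toy decision rule are all assumptions made for illustration; they are not taken from the SAIL system.

```python
from typing import Dict, List, Tuple


class SASEAgent:
    """Toy self-aware, self-effecting agent (illustrative assumption only)."""

    def __init__(self) -> None:
        # Internal representation R, reduced here to two named values.
        self.R: Dict[str, float] = {"attention_gain": 1.0, "last_action": 0.0}

    def sense(self, external: List[float]) -> List[float]:
        # x(t): external signals from E followed by the internally sensed part of R.
        internal = [self.R["attention_gain"], self.R["last_action"]]
        return list(external) + internal

    def act(self, x: List[float]) -> Tuple[float, Dict[str, float]]:
        # The external action depends on external lines and on the sensed gain.
        external_action = self.R["attention_gain"] * sum(x[:-2])
        # The internal action lowers the gain after a strong external action.
        gain = 0.5 if external_action > 1.0 else 1.0
        return external_action, {"attention_gain": gain}

    def step(self, external: List[float]) -> Tuple[List[float], float]:
        x = self.sense(external)
        external_action, internal_action = self.act(x)
        self.R.update(internal_action)        # internal effectors act on R
        self.R["last_action"] = external_action
        return x, external_action


agent = SASEAgent()
x1, a1 = agent.step([1.0, 2.0])
x2, a2 = agent.step([1.0, 2.0])
```

The same external input produces a different sensory vector and a different action on the second step, because the agent's own internal state is part of what it senses.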

A traditional non-SASE agent does use an internal representation R to
make decisions. However, this decision process and the internal
representation R are not included in what is to be sensed, perceived,
recognized, discriminated, understood and explained by the agent
itself. Thus, a non-SASE agent is not able to understand what it is
doing, or in other words, it is not self-aware. Further, the behaviors
that it generates are for the external world only, not for the brain
itself. Thus, it is not able to autonomously change its internal
decision steps either. For example, it is not able to modify its value
system based on its experience of what is good and what is bad.

It is important to note that not all the internal brain
representations are sensed by the brain itself. For example, we cannot
sense why we have interesting visual illusions (Eagleman, 2001).


3.2  Autonomous  mental  development 
(AMD) 

An agent can perform one, multiple or an open number of tasks. The
task here is not restricted by type, scope, or level. Therefore, a
task can be a subtask of another. For example, making a turn at a
corner and navigating around a building can both be tasks.

To enable an agent to perform certain tasks, the traditional paradigm
involves developing task-specific architecture, representation, and
skills through human hands, which we call “manual” development. The
manual paradigm has two phases, the manual development phase and the
automatic execution phase. In the first phase, a human developer H is
given a specific task T to be performed by the machine and a set of
ecological conditions E_c about the operational environment. The human
developer first understands the task. Next, he designs a task-specific
architecture and representation and then programs the agent A. In
mathematical notation, we consider a human as a (time varying)
function that maps the given task T and the set of ecological
conditions E_c to agent A:

A = H(E_c, T).  (1)

In the automatic execution phase, the machine is placed in the
task-specific setting. It operates by sensing and acting. It may
learn, using sensory data to change some of its internal parameters.
However, it is the human who understands the task and programs the
internal representation. The agent just runs the program.

Correspondingly, the autonomous development paradigm has two different
phases, the construction and programming phase and the autonomous
development phase.

In the first phase, the tasks that the agent will end up learning are
unknown to the robot programmer. The programmer might speculate about
some possible tasks, but writing a task-specific representation is not
possible without actually being given a task. The ecological
conditions under which the robot will operate, e.g., land-based or
undersea, are provided to the human developer so that he can design
the agent body appropriately. He writes a task-nonspecific program
called a developmental program, which controls the process of mental
development. Thus the newborn agent A(0) is a function of a set of
ecological conditions only, but not of the task:

A(0) = H(E_c),  (2)

where we added the time variable t to the time varying agent A(t),
assuming that the birth time is at t = 0.
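
The difference between Eq. (1) and Eq. (2) can be read as a difference in function signatures. The sketch below is only an illustration under stated assumptions: the names `Agent`, `manual_development`, `developmental_birth` and `develop` are hypothetical, and representing skills as a list of strings is a deliberate oversimplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    ecological_conditions: str
    task: Optional[str]                      # None for a newborn developmental agent
    skills: List[str] = field(default_factory=list)


def manual_development(ec: str, task: str) -> Agent:
    """Eq. (1): A = H(E_c, T). The task is an input to the human designer."""
    return Agent(ecological_conditions=ec, task=task, skills=[task])


def developmental_birth(ec: str) -> Agent:
    """Eq. (2): A(0) = H(E_c). No task is known when the agent is built."""
    return Agent(ecological_conditions=ec, task=None)


def develop(agent: Agent, experience: List[str]) -> Agent:
    """A(t) for t > 0: skills accumulate through interaction, not reprogramming."""
    for taught in experience:
        if taught not in agent.skills:
            agent.skills.append(taught)
    return agent


manual = manual_development("land-based", "corridor navigation")
newborn = developmental_birth("land-based")
grown = develop(newborn, ["turning at a corner", "navigating a building",
                          "turning at a corner"])
```

The manually developed agent is fixed to its task at construction time, whereas the developmental agent starts with no task and acquires skills only after birth.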

After the robot is turned on at time t = 0, the robot is “born” and it
starts to interact with the physical environment in real time by
continuously sensing and acting. This phase is called the autonomous
development phase. Human teachers can affect the developing robot only
as a part of the environment, through the robot’s sensors and
effectors. After the birth, the internal representation is not
accessible to the human teacher.

Figure 4: The manual development paradigm (a) and the autonomous
development paradigm (b).

Various learning modes are available to the teacher during autonomous
development. He can use supervised learning by directly manipulating
(compliant) robot effectors (see, e.g., (Weng et al., 1999)), the way
a teacher holds the hand of a child while teaching him to write. He
can use reinforcement learning by letting the robot try on its own
while the teacher encourages or discourages certain actions by
pressing the “good” or “bad” buttons in the right context (see, e.g.,
(Weng et al., 2000a) (Zhang and Weng, 2001b)). The environment itself
can also produce reward directly, for example, through a “sweet”
object and a “bitter” one (see, e.g., (Almassy et al., 1998)). With
multiple tasks in mind, the human teacher figures out which learning
mode is more suitable and efficient, and he typically teaches one task
at a time. Skills acquired early are used later by the robot to
facilitate learning new tasks.

Fig. 4 illustrates the traditional manual development paradigm and the
autonomous development paradigm.


3.3  Internal  Representation 

Autonomous generation of internal representation is central to mental
development. Traditional AI systems use symbolic representation for
internal representation and decision making. Is symbolic
representation suited for a developmental robot? In AI research, the
issue of representation has not been sufficiently investigated, mainly
due to the traditional manual development paradigm. There has been a
confusion of concepts in representation, especially between reality
and the observation made by the agents. To be precise, we first define
some terms.

A world concept is a concept about objects in the external environment
of the agent, which includes both the environment external to the
robot and the physical body of the robot. The mind concept^1 is
internal with respect to the nervous system (including the brain).

Definition 2 A world centered representation is such that every item
in the representation corresponds to a world concept. A body centered
representation is such that every item in the representation
corresponds to a mind concept.

A  mind  concept  is  related  to  phenomena  observable 
from  the  real  world,  but  it  does  not  necessarily  reflect 
the  reality  correctly.  It  can  be  an  illusion  or  totally 
false. 

Definition 3 A symbolic representation is about a concept in the world
and, thus, it is world centered. It is in the form
A = (v_1, v_2, ..., v_n), where A (optional) is the name token of the
object and v_1, v_2, ..., v_n are the unique set of attributes of the
object with predefined symbolic meanings.

For example, Apple = (weight, color) is a symbolic representation of a
class of objects called apple. Apple-1 = (0.25g, red) is a symbolic
representation of a concrete object called Apple-1. The set of
attributes is unique in the sense that the object’s weight is given by
the unique entry v_1. Of course, other attributes such as confidence
of the weight can be used. A typical symbolic representation has the
following characteristics:

1. Each component in the representation has a predefined meaning about
the object in the external world.

2. Each attribute is represented by a unique variable in the
representation.

3. The representation is unique for a single corresponding physical
object in the external environment.

^1 The term “mind” is used for ease of understanding. We do not claim
that it is similar to the human mind.


World centered symbolic representation has been widely used in
symbolic knowledge representation, databases, expert systems, and
traditional AI systems.

Another  type  of  representation  is  motivated  by  the 
distributed  representation  in  the  biological  brain: 

Definition 4 A distributed representation is not necessarily about any
particular object in the environment. It is body centered, grown from
the body’s sensors and effectors. It is in a vector form
A = (v_1, v_2, ..., v_n), where A (optional) denotes the vector and
v_i, i = 1, 2, ..., n, corresponds to either a sensory element (e.g.,
pixel or receptor) in the sensory input, a motor control terminal in
the action output, or a function of them.

For example, suppose that an image produced by a digital camera is
denoted by a column vector I, whose dimension is equal to the number
of pixels in the digital image. Then I is a distributed
representation, and so is f(I), where f is any function. A distributed
representation of dimension n can represent the response of n neurons.
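
The image example can be written out directly. In the sketch below, the 2 x 2 image, the helper names `flatten_image` and `mean_center`, and the choice of mean-centering as the function f are all assumptions made for illustration; any other function of the pixel vector would serve equally well.

```python
from typing import List


def flatten_image(image: List[List[int]]) -> List[float]:
    """Stack the pixel rows into one vector I with one entry per pixel."""
    return [float(pixel) for row in image for pixel in row]


def mean_center(v: List[float]) -> List[float]:
    """One possible function f: subtract the mean intensity from every entry."""
    mean = sum(v) / len(v)
    return [value - mean for value in v]


# A hypothetical 2 x 2 grey-level image.
image = [[0, 64],
         [128, 255]]

I = flatten_image(image)   # distributed representation I
f_I = mean_center(I)       # f(I) is again a distributed representation
```

No entry of `I` or `f_I` has a predefined meaning about an object in the world; each entry is tied only to a sensory element or to a function of sensory elements.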

The world centered and body centered representations are the same only
in the trivial case where the entire external world is the only single
object for cognition, so that there is no need to recognize different
objects in the world. A thermostat is an example: the complex world
around it is nothing more than a temperature to it. Since cognition
must include discrimination, cognition itself is not needed in such a
trivial case. Otherwise, a body centered representation is very
different from a world centered representation. Some later (later in
processing steps) body centered representations can have a more
focused correspondence to a world concept in a mature developmental
robot, but they will never be identical. For example, the
representation generated by a view of a red apple is distributed over
many cortical areas and, thus, is not the same as a human-designed,
atomic, world centered symbolic representation.

A  developmental  program  is  designed  after  the 
robot  body  has  been  designed.  Thus,  the  sensors 
and  effectors  of  the  robot  are  known,  and  so  are  their 
signal  formats.  Therefore,  the  sensors  and  effectors 
are  two  major  sources  of  information  for  generating 
distributed  representation. 

Another source of information is the internal sensors and effectors,
which may grow or die according to the autonomously generated or
deleted representation. Examples of internal effectors include
attention effectors in a sensory cortex and rehearsal effectors in a
premotor cortex. Internal attention effectors are used for turning on
or turning off certain signal lines for, e.g., internal visual
attention. Rehearsal effectors are useful for planning before an
action is actually released to the motors. The internal sensors
include those that sense internal effectors. In fact, all the
conscious internal effectors should have corresponding internal
sensors.

It seems that a developmental program should use a distributed
representation, because the tasks are unknown at the robot programming
time. It is natural that the representation in earlier processing is
very much sensor centered and the representation in later processing
is very much effector centered. Learned associations map perceptually
very different sensory inputs to the same equivalence class of
actions. This is because a developmental being is shaped by the
environment to produce such a desired behavior.

On the other hand, an effector centered representation can correspond
to a world object well. For example, suppose that the eyes of a child
sense (see) his father’s portrait and his ears sense (hear) the
question “who is he?” The internally primed action can be any of the
following: saying “he is my father,” “my dad,” “my daddy,” etc. In
this example, the later action representation can correspond to a
world object, “father,” but it is still a (body centered) distributed
representation. Further, since the generated actions are not unique
given different sensory inputs of the same object, there is no way for
the brain (human or robot) to arrive at a unique representation from
the wide variety of sensory contexts that reflect a world containing
the same single object as well as others. For example, there is no way
for the brain to arrive at a unique representation in the above
“father” example. Therefore, a symbolic representation is not suited
for a developmental program, while a distributed representation is.

4.  SAIL  -  An  example  of  developmental  robots

The SAIL robot is our current test-bed for the autonomous
developmental process. It is a human-size mobile robot built in-house
at Michigan State University, with a drive-base, a six-joint robot
arm, a rotary neck, and two pan-tilt units, on which two CCD cameras
(as eyes) are mounted. A wireless microphone functions as an ear. The
SAIL robot has four pressure sensors on its torso and 28 touch sensors
on its eyes, arm, neck, and bumper. Its main computer is a
dual-processor dual-bus PC workstation with 512 MB RAM and a 27 GB
three-drive disk array. All the sensory information processing, memory
recall and update, and effector control are done in real time.

According to the theory presented in Section 3, our SAIL developmental
algorithm has some “innate” reflexive behaviors built in. At the
“birth” time of the SAIL robot, its developmental algorithm starts to
run. This developmental algorithm runs in real time, through the
entire “life span” of the robot. In other words, the design of the
developmental program cannot be changed once the robot is “born,” no
matter what tasks it ends up learning. The robot learns while
performing. The innate reflexive behaviors enable it to explore the
environment while improving its skills. The human trainers train the
robot by interacting with it, very much like the way human parents
interact with their infant: letting it look around, demonstrating how
to reach objects, teaching commands with the required responses,
delivering reward or punishment (pressing “good” or “bad” buttons on
the robot), etc. The SAIL developmental algorithm updates the robot
memory in real time according to what was sensed by the sensors, what
it did, and what it received as feedback from the human trainers.

Figure 5: Partial internal architecture of a single level in the SAIL
developmental program.

4.1  Architecture

The  schematic  architecture  of  a  single  level  of  SAIL  is 
shown  in  Fig.  5.  Sensory  inputs  first  enter  a  module 
called  sensory  mapping,  whose  detailed  structure  is 
discussed  in  Section  4.2. 

Internal attention for vision, audition and touch is a very important
mechanism for the success of multimodal sensing. A major challenge of
perception for high dimensional data inputs such as vision, audition
and touch is that often not all the lines in the input are related to
the task at hand. Attention selection allows the signals of only a
bundle of relevant lines to pass through while the others are blocked.
Attention selection is an internal effector since it acts on the
internal structure of the “brain” instead of the external environment.

First, each sensing modality (vision, audition and touch) needs
intra-modal attention to select a subset of internal output lines for
further processing while disregarding the other, unrelated lines.
Second, inter-modal attention selects a single modality or multiple
modalities for attention. Attention is necessary not only because our
processors have limited computational power but, more importantly,
because focusing on only the related inputs enables powerful
generalization.
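
The two kinds of attention selection can be sketched as masking operations. This is an illustration under stated assumptions, not the SAIL implementation: the function names are hypothetical, and representing a blocked line by the value 0.0 is a simplification chosen here.

```python
from typing import Dict, List, Set


def intra_modal_attention(lines: List[float], mask: List[bool]) -> List[float]:
    """Pass the lines selected by the mask; block (zero) all other lines."""
    if len(lines) != len(mask):
        raise ValueError("mask must have one entry per line")
    return [value if keep else 0.0 for value, keep in zip(lines, mask)]


def inter_modal_attention(modalities: Dict[str, List[float]],
                          attended: Set[str]) -> Dict[str, List[float]]:
    """Pass whole modalities that are attended; block the remaining ones."""
    return {
        name: list(lines) if name in attended else [0.0] * len(lines)
        for name, lines in modalities.items()
    }


# Hypothetical signal lines for three modalities.
signals = {
    "vision": [0.9, 0.1, 0.4],
    "audition": [0.3, 0.7],
    "touch": [1.0],
}

visual_focus = intra_modal_attention(signals["vision"], [True, False, True])
attended = inter_modal_attention(signals, {"vision", "touch"})
```

In a SASE agent the masks themselves would be produced by internal effectors, so that what is attended to is learned rather than fixed by the programmer.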


The  cognitive  mapping  module  is  the  central  part 
of  the  system.  It  is  responsible  for  learning  the  asso¬ 
ciation  between  the  sensory  information,  the  context, 
and  the  behavior.  The  behaviors  can  be  both  exter¬ 
nal  and  internal.  The  external  behaviors  correspond 
to  control  signals  for  external  effectors  such  as  the 
joint  motors  of  a  robot  arm,  or  whatever  peripher¬ 
als  that  the  robot  has  to  act  on  the  environment. 
The  internal  behaviors  include  the  above-mentioned 
attention  selection  signals  for  the  sensory  mapping 
module,  the  effector  that  manipulates  the  internal 
states  and  the  threshold  control  signals  to  the  gating 
system.  The  cognitive  mapping  is  implemented  by 
the  IHDR  tree,  which  mathematically  computes  the 
mapping, 

$$g : S \times X \rightarrow S \times A,$$

where  S  is  the  state  (context)  space,  X  is  the  sen¬ 
sory  space,  and  A  is  the  action  space.  IHDR  derives 
the  best  features  that  are  most  relevant  to  output 
by  doing  a  double  clustering  in  both  input  and  out¬ 
put  space.  It  constructs  a  tree  structure  and  repeats 
the  double  clustering  in  a  coarse-to-fine  manner  in 
each of the tree nodes. The resulting tree structure is
used to find the best-matching input cluster in
logarithmic time. Compared to other methods, such
as artificial neural networks, linear discriminant analysis,
and principal component analysis, IHDR has
advantages in handling high-dimensional input, performing
discriminant feature selection, and learning from
one instance.
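
To illustrate why a coarse-to-fine tree makes retrieval fast, the following is a minimal sketch of tree search over hand-built clusters. It is not IHDR: there is no double clustering and no discriminant feature derivation, and the node layout, the centers, and the state and action labels are all hypothetical. It only shows that a query is compared with the children along one root-to-leaf path rather than with every stored prototype.

```python
import math


class Node:
    def __init__(self, center, children=None, output=None):
        self.center = center            # cluster center in input space
        self.children = children or []  # finer clusters; empty for a leaf
        self.output = output            # stored (state, action) for a leaf


def retrieve(root, x):
    """Descend from coarse to fine clusters; return (output, comparisons)."""
    node, comparisons = root, 0
    while node.children:
        comparisons += len(node.children)
        node = min(node.children, key=lambda c: math.dist(c.center, x))
    return node.output, comparisons


# A hand-built two-level tree with four leaf prototypes (hypothetical data).
left = [Node((0.0, 0.0), output=("s0", "forward")),
        Node((0.0, 1.0), output=("s1", "turn_left"))]
right = [Node((5.0, 0.0), output=("s2", "turn_right")),
         Node((5.0, 1.0), output=("s3", "stop"))]
root = Node((2.5, 0.5), children=[
    Node((0.0, 0.5), children=left),
    Node((5.0, 0.5), children=right),
])

output, comparisons = retrieve(root, (4.8, 0.9))
```

With branching factor q and depth d, such a search makes about q times d comparisons while the tree can hold up to q to the power d leaf prototypes, which is the source of the logarithmic retrieval time.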

The  gating  system  evaluates  whether  the  intended 
action  accumulates  sufficient  thrust  to  be  issued  as 
an actual action. In this way, actions are actually
made only when a sufficient number of action primitives
have been given over time by the cognitive
mapping module. This mechanism significantly reduces
the requirement on the timing accuracy of
issued action primitives.
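
One simple way to picture such a gate is a leaky accumulator with a threshold. The sketch below is an assumption for illustration: the decay factor, the threshold value, and the class and method names are hypothetical and are not taken from the SAIL gating system.

```python
class Gate:
    """Issue an action only after enough recent votes have accumulated."""

    def __init__(self, threshold, decay=0.8):
        self.threshold = threshold  # evidence needed to issue an action (assumed)
        self.decay = decay          # how fast old votes fade (assumed)
        self.votes = {}

    def step(self, primitive):
        """Feed one action primitive (or None); return an action or None."""
        for name in self.votes:
            self.votes[name] *= self.decay
        if primitive is None:
            return None
        self.votes[primitive] = self.votes.get(primitive, 0.0) + 1.0
        if self.votes[primitive] >= self.threshold:
            self.votes[primitive] = 0.0  # reset after the action is issued
            return primitive
        return None


gate = Gate(threshold=2.0)
# The primitive "left" arrives three times with a gap in between.
issued = [gate.step(p) for p in ["left", None, "left", "left"]]
```

Because evidence fades gradually instead of being discarded at once, the primitives do not have to arrive at exactly the right moments for the action to be issued.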

Three  types  of  learning  modes  have  been  imple¬ 
mented  on  SAIL:  learning  by  imitation  (supervised 
learning),  reinforcement  learning,  and  communica¬ 
tive  learning.  In  the  following  sections,  we  explain 
how  learning  is  conducted  by  the  SAIL  robot  while 
providing  the  experimental  results. 

4.2  Staggered  Hierarchical  Mapping

We have designed and implemented a sensory mapping
method, called “Staggered Hierarchical Mapping
(SHM),” shown in Fig. 6, and its developmental
algorithm (Zhang and Weng, 2002a). Its goals are
(1) to generate feature representations for receptive
fields at different positions in the sensory
space and with different sizes, and (2) to allow
attention selection for local processing. SHM is a
model motivated by human early visual pathways,
including processing performed by the retina, the
Lateral Geniculate Nucleus (LGN), and the primary
visual cortex. A new Incremental Principal Component
Analysis (IPCA) method is used to automatically
develop orientation-sensitive and other needed
filters  (Zhang  and  Weng,  2001a).  From  sequentially 
sensed  video  frames,  the  proposed  algorithm  devel¬ 
ops  a  hierarchy  of  filters,  whose  outputs  are  uncor¬ 
related  within  each  layer,  but  with  increasing  scale 
of  receptive  fields  from  low  to  high  layers.  To  study 
the  completeness  of  the  representation  generated  by 
the  SHM,  we  experimentally  showed  that  the  re¬ 
sponse  produced  at  any  layer  is  sufficient  to  recon¬ 
struct  the  corresponding  “retinal”  image  to  a  great 
degree. This result indicates that the internal representation
generated for receptive fields at different
locations and sizes is nearly complete in the sense
that it does not lose important information. The
attention selection effector is internal and thus cannot
be  guided  from  the  “outside”  by  a  human  teacher. 
The  behaviors  for  internal  effectors  can  be  learned 
through  reinforcement  learning  and  communicative 
learning. 
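
The caption of Fig. 6 states that the receptive field in each layer is 20% larger than in the previous layer and that its size is rounded to the nearest integer. The short sketch below works through that arithmetic. The base size of 10 and the number of layers are hypothetical, and the sketch assumes that the 20% growth compounds on the unrounded size.

```python
import math


def receptive_field_sizes(base_size, num_layers, growth=1.2):
    """Receptive-field size per layer, rounded to the nearest integer."""
    return [
        math.floor(base_size * growth ** k + 0.5)  # round half up
        for k in range(num_layers)
    ]


# Hypothetical base size of 10 inputs per side, five layers.
sizes = receptive_field_sizes(base_size=10, num_layers=5)
print(sizes)  # [10, 12, 14, 17, 21]
```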

4.3  Vision-guided  navigation

In the vision-guided navigation experiment
(Weng et al., 2000a), a human teacher teaches
the robot by taking it for a walk along the hallways
of the MSU Engineering Building. Force sensors on the
robot body sense the push action of the teacher, and
the robot’s two drive wheels comply by moving at a speed
that is proportional to the force sensed on each
side. In other words, the robot performs supervised
learning in real time through imitation.
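
The compliance rule and its role in supervised learning can be sketched as follows. The gain value, the units, and the function names are assumptions for illustration; the text only states that each wheel's speed is proportional to the force sensed on its side.

```python
def comply(force_left, force_right, gain=0.5):
    """Wheel speeds proportional to the force sensed on each side."""
    return gain * force_left, gain * force_right


training_samples = []  # (image, action) pairs gathered while being pushed


def teach_step(image, force_left, force_right):
    """Move as the teacher pushes and record the pair as a training sample."""
    action = comply(force_left, force_right)
    training_samples.append((image, action))
    return action


# One hypothetical step: a tiny stand-in "image" and two force readings.
action = teach_step([0.1, 0.2, 0.3], force_left=2.0, force_right=4.0)
```

Each push thus yields both a motion and a labeled example, which is why no separate labeling step is needed.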

The  IHDR  mapping  algorithm  processes  the  input 
image in real time. It derives features that are related
to the action but disregards features that are
not. The human teacher does not need to define
features. The system runs at about 10 Hz, i.e., 10 updates
of navigation decisions per second. In other
words, every 100 milliseconds a different set of feature
subspaces is used. To address the requirement
of  real-time  speed,  the  IHDR  method  incrementally 
constructs  a  tree  architecture  which  automatically 
generates  and  updates  the  representations  in  a  coarse 
to  fine  fashion.  The  real-time  speed  is  achieved  by 
the  logarithmic  time  complexity  of  the  tree  in  that 
the  time  required  to  update  the  tree  for  each  sensory 
frame  is  a  logarithmic  function  in  the  number  of  fine 
clusters  (prototypes)  in  the  tree. 

After  4  trips  along  slightly  different  trajectories 
along  the  hallways,  the  human  teacher  started  to 
let  the  robot  go  free.  He  needed  to  “hand  push” 
the  robot  at  certain  places  when  necessary  until  the 
robot  could  reliably  navigate  along  the  hallway,  with¬ 
out  a  need  for  “hand-lead.”  We  found  that  about  10 
trips  were  sufficient  for  the  SAIL  robot  to  navigate 
along the hallways, using only vision, without using
any range sensors. Fig. 7 shows some images that
the robot saw during the navigation.

Figure 6: The architecture of SHM. Each square denotes a neuron. Layer 0 is the input image. The neurons marked
as black in layer 1 belong to the same eigen-group. Bold lines that are derived from a single neuron and expanded to
the original image mark the receptive field of that neuron. The size of the receptive field in a particular layer is 20%
larger than its previous layer in this diagram, which is shown at the right. The size of the receptive field is rounded to
the nearest integer. SHM allows not only a bottom up response computation, but also a top down attention selection.
The oval indicates the lines selected by attention selector.

4.4  Grounded  speech  learning

Similar to learning vision-guided navigation, the
SAIL robot can learn to follow voice commands
through physical interaction with a human
trainer (Zhang and Weng, 2001b). In the early supervised
learning stage, a trainer spoke a command
to  the  robot  and  then  executed  a  desired  action  by 
pressing  a  pressure  sensor  or  a  touch  sensor  that  was 
linked  to  the  corresponding  effector.  At  later  stages, 
when  the  robot  can  explore  more  or  less  on  its  own, 
the  human  teacher  uses  reinforcement  learning  by 
pressing  its  “good”  or  “bad”  button  to  encourage  and 
discourage certain actions. Typically, after about
15 to 30 minutes of interaction with a particular human
trainer, the SAIL robot could follow commands with
a correct rate of about 90%. Table 1 shows the voice
commands learned by the SAIL robot and its performance.
Fig. 8 shows the graphical user interface for
humans to monitor the progress of online grounded
speech learning.

Figure 8: The GUI of AudioDeveloper: (a) During online
learning; (b) After online learning.

4.5  Communicative  learning

Recently, we have successfully implemented the new
communicative learning mode on the SAIL robot.
First, in the language acquisition stage, we taught
SAIL simple verbal commands, such as “go ahead,”
“turn left,” “turn right,” “stop,” “look ahead,” “look
left,” “look right,” etc., by speaking to it online while
guiding the robot to perform the corresponding action.
In the next stage, teaching using language, we
taught  the  SAIL  robot  what  to  do  in  the  correspond¬ 
ing  context  through  verbal  commands.  For  exam¬ 
ple,  when  we  wanted  the  robot  to  turn  left  (a  fixed 
amount  of  heading  increment),  we  told  it  to  “turn 
left.”  When  we  wanted  it  to  look  left  (also  a  fixed 
amount  of  increment),  we  told  it  to  “look  left.”  This 
way,  we  did  not  need  to  physically  touch  the  robot 
during  training  and  used  instead  much  more  sophis¬ 
ticated  verbal  commands.  This  made  training  more 
efficient and more precise. Fig. 9 shows the SAIL
robot navigating in real-time along the corridors of
the Engineering Building, at a typical human walking
speed.

Figure 7: A subset of images used in autonomous navigation problem. The number right below the image shows the
needed heading direction (in degrees) associated with that image.

Table 1: Performance of the SAIL robot in grounded speech learning. After training, the trainer tested the SAIL
robot by guiding it through the second floor of Engineering Building. As SAIL did not have perfect heading alignment,
the human trainer used verbal commands to adjust robot heading during turns and straight navigation. During the
navigation, the arm and eye commands are issued 10 times each at different locations.

Commands           Go left      Go right    Forward     Backward    Freeze
Correct rate (%)   88.9         89.3        92.8        87.5        88.9

Commands           Arm left     Arm right   Arm up      Arm down    Hand open
Correct rate (%)   90           90          100         100         90

Commands           Hand close   See left    See right   See up      See down
Correct rate (%)   80           100         100         100         100

4.6  Action  chaining

The  capability  of  learning  new  skills  is  very 
important  for  an  artificial  agent  to  scale  up. 
We  have  designed  and  implemented  a  hierarchi¬ 
cal  developmental  learning  architecture  (Fig.  10), 
which  enables  a  robot  to  develop  complex  behav¬ 
iors  (chained  actions)  after  acquisition  of  simple 
ones  (Zhang  and  Weng,  2002b).  The  mechanism 
that  makes  this  possible  is  chained  secondary  con¬ 
ditioning.  An  action  chaining  process  can  be  written 
mathematically  as, 

$$C_c \rightarrow C_{s1} \rightarrow A_{s1} \rightarrow C_{s2} \rightarrow A_{s2} \;\Rightarrow\; C_c \rightarrow A_{s1} \rightarrow A_{s2} \qquad (3)$$

where $C_c$ is the composite command, and $C_{s1}$ and $C_{s2}$
are commands invoking basic actions $A_{s1}$ and $A_{s2}$,
respectively. $\rightarrow$ means “followed by”, and $\Rightarrow$ means
“develops”. The problem here is that $C_{s1}$ and $C_{s2}$
are missing in the developed stimuli-response association.
The major challenge of this work is that
training  and  testing  must  be  conducted  in  the  same 
mode  through  online  real-time  interactions  between 
the  robot  and  the  trainer. 

In the experiment, upon learning the basic gripper
tip movements (Fig. 11), the SAIL robot learned to
combine individually instructed movements into a
composite one invoked by a single verbal command
without any reprogramming (Fig. 12). To solve the
problem  of  missing  context  in  action  chaining,  we 
modeled  a  primed  context  as  the  follow-up  sensation 
and  action  of  a  real  context.  By  backpropagating 
the  primed  context,  a  real  context  was  able  to  pre¬ 
dict  future  contexts,  which  enabled  the  agent  to  re¬ 
act  correctly  even  with  some  missing  contexts.  The 
learning  strategy  integrated  supervised  learning  and 
reinforcement  learning.  To  handle  the  “abstraction” 
issue  in  real  sensory  inputs,  a  multi-level  architecture 
was  used  with  the  higher  level  emulating  the  function 
of  higher-order  cortex  in  biology  in  some  sense. 
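
The effect of action chaining in Eq. (3) can be pictured with a deliberately simplified, symbolic sketch. It is a toy under strong assumptions: commands and actions are plain strings, and a context is the pair (composite command, number of actions already done). SAIL itself works on real high-dimensional sensory inputs without preprogrammed symbols, so this shows only the bookkeeping idea of linking each action to the context that preceded its omitted sub-command.

```python
def learn_chain(composite, sub_commands, basic_actions):
    """Build context -> action links that skip the sub-commands.

    Training order is composite, Cs1, As1, Cs2, As2, ...; each action is
    backed up to the context that preceded its (later omitted) sub-command.
    """
    chain = {}
    for step, command in enumerate(sub_commands):
        chain[(composite, step)] = basic_actions[command]
    return chain


def execute(composite, chain):
    """Replay the chained actions from the composite command alone."""
    actions, step = [], 0
    while (composite, step) in chain:
        actions.append(chain[(composite, step)])
        step += 1
    return actions


# Basic command -> action associations, assumed to be learned earlier.
basic_actions = {"Cs1": "As1", "Cs2": "As2"}
chain = learn_chain("Cc", ["Cs1", "Cs2"], basic_actions)
result = execute("Cc", chain)  # ["As1", "As2"]
```

After training, the composite command alone yields the action sequence, with the sub-commands no longer needed, which corresponds to the right-hand side of Eq. (3).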

5.  Value  system 

A  value  system  of  a  robot  enables  the  robot  to  know 
what  is  bad  and  what  is  good,  and  to  act  for  the 
good.  Without  a  value  system,  a  robot  either  does 
nothing  or  does  every  move  mechanically  and  thus 
lacks  intelligence.  We  have  designed  and  imple¬ 
mented  a  low  level  value  system  for  the  SAIL  robot. 
The  value  system  integrates  the  habituation  mecha¬ 
nism  and  reinforcement  learning  so  that  the  robot’s 
responses to certain visual stimuli would change after
interacting with human trainers. For more details,
the reader is referred to another paper in the proceedings
of this workshop (Huang and Weng, 2002).

Figure 9: SAIL robot navigates autonomously using its autonomously developed visual perceptual behaviors. Four
movies are available at http://www.egr.msu.edu/mars/ to provide more results.

Figure 12: The SAIL robot learned the chained action after being verbally instructed by human trainers.

Figure 10: A hierarchical developmental learning architecture
for action chaining.

Figure 11: Gripper tip trajectories of the SAIL robot.
(a)-(d) are basic actions, each of which starts from the
black dot. (e)-(g) are composite actions chaining some
or all of the basic ones.

6.  Comparison  with  others’  work

What is the most basic difference between a traditional
learning algorithm and a developmental algorithm?
Autonomous development does require a capability
of learning, but it requires something more
fundamental. A developmental algorithm must be
able to learn tasks that its programmers do not
know or cannot even predict. This is because a
developmental algorithm, once designed before robot
“birth,”  must  be  able  to  learn  new  tasks  and  new 
skills  without  requiring  re-programming.  The  rep¬ 
resentation  of  a  traditional  learning  algorithm  is  de¬ 
signed  by  humans  for  a  given  task  but  that  for  a  de¬ 
velopmental  algorithm  must  be  autonomously  gener¬ 
ated.  As  a  working  example,  humans’  developmental 
algorithm  enables  humans  to  learn  new  skills  without 
a  need  to  change  the  design  of  their  brain. 

However, the motive of developmental robots is
not to make robots more difficult to program, but
relatively easier instead. The task-nonspecific nature of
a  developmental  program  is  a  blessing.  It  relieves 
human  programmers  from  the  daunting  tasks  of 
programming  task-specific  visual  recognition,  speech 
recognition,  autonomous  navigation,  object  manip¬ 
ulation,  etc,  for  unknown  environments.  The  pro¬ 
gramming  task  for  a  developmental  algorithm  con¬ 
centrates  on  self-organization  schemes,  which  are 
more  manageable  by  human  programmers  than  the 
above  task-specific  programming  tasks  for  unknown 
or  partially  unknown  environments. 

Designing and implementing a developmental program
is systematic and clearly understandable using
mathematical tools. Designing a perception program
and  its  representation  in  a  task-specific  way  using  a 
traditional  approach,  however,  is  typically  very  com¬ 
plex,  ad  hoc,  and  labor  intensive.  The  resulting  sys¬ 
tem  tends  to  be  brittle.  Design  and  implementa¬ 
tion  of  a  developmental  program  are  of  course  not 
easy.  However,  the  new  developmental  approach  is 
significantly more tractable than the traditional approaches
in programming a perception machine. Further,
it is applicable to uncontrolled real-world environments,
and it is the only approach that is capable of doing
this.

Due  to  its  cross-environment  capability,  the  SAIL 
robot  has  demonstrated  vision-guided  autonomous 
navigation  capability  in  both  complex  outdoor 
and indoor environments. The Hierarchical Discriminant
Regression (HDR) engine played a central
role in this success. Although ALVINN at
CMU (Pomerleau, 1989) can in principle be applied
to indoor scenes, the local minima and loss-of-memory
problems of artificial neural networks make it very
difficult for it to work in complex indoor scenes.

The SAIL robot has successfully developed a real-time,
integrated multimodal (vision, audition, touch,
keyboard, and wireless network) human-robot interaction
capability, which allows a human operator to
intervene to different degrees seamlessly. A
basic reason for achieving this extremely challenging
capability is that the SAIL robot is developed
to associate tens of thousands of multimodal
contexts in real time in a grounded fashion, which is
another central idea of AMD. Some behavior-based
robots such as Cog and Kismet at MIT interact online
with humans, but they are hand programmed off-line.
They cannot interact with humans
while learning.

The  perception-based  action  chaining  allows  the 
SAIL  robot  to  develop  complex  perception-action  se¬ 
quences  (or  behaviors)  from  simple  perception-action 
sequences (behaviors) through real-time online
human-robot interactions, all in the same
continuous operational mode. This capability appears
simpler than it really is. The robot must
infer context in a high-dimensional perception
vector space. It generates new internal representations
and uses them for later context prediction, which
is central for scaling up in AMD. David Touretzky’s
Skinnerbot (Touretzky and Saksida, 1999) does action
chaining, but it does so through preprogrammed
symbols, and thus the robot is not applicable to unknown
environments.

7.  Conclusion 

For  a  robot,  every  action  is  context  dependent,  i.e.,  it 
is  tightly  dependent  on  the  rich  information  available 
in  the  sensory  input  and  the  state.  The  complexity 
of  the  rules  of  such  context  dependence  is  beyond 
human programming, which is one of the fundamental
reasons why it has proved extremely difficult
to develop, by traditional means, robots that run
in a typical human environment.

We introduced here a new kind of robot: developmental
robots that can develop their mental skills automatically
through real-time interactions with the
environment. Motivated by human mental development
from infancy to adulthood, the proposed theoretical
framework has been demonstrated on the SAIL
robot in multiple tasks, from vision-guided navigation
and grounded speech learning to behavior scale-up
through action chaining, all learned and performed
online in real time. The main reason behind
this achievement is that the robot does not rely
on humans to predefine its representation. The representation
of the system is automatically generated
through the interaction between the developmental
mechanism and the experience. We believe what we
have achieved is a starting point for a promising new
direction in robotics. While there are still plenty of
practical questions waiting for us to answer, it opens
a wide range of opportunities for future research.